amnesia-ab CONFIRMED + Ollama backend adapter + v0.1.1 prep#67
Merged
Conversation
The cheapest discriminating test of the central loop (spec row 9, library- driven retrieval) as a faithful miniature: mxbai-embed-large cosine top-5 -> inject -> llama3 8B Q4 via Ollama, 62-memory corpus w/ cross-project distractors, 20 tasks, objective key-fact scoring. Measured (results/run-2026-06-11T20-32-21.json): memory_on_fact_recall_pct 94.2 (confirm >=70) retrieval_hit_rate_pct 100.0 (confirm >=80) memory_off_fact_recall_pct 2.5 (sanity <=25 — corpus not guessable) latency p50 on/off 19.5s / 12.3s OFF-arm failure mode is confident fabrication, not ignorance — the memory loop is the difference between correct specifics and plausible lies on an 8B model. Per the decision rule this justifies: the Ollama backend adapter, activating mem0-v3-locomo, and cutting v0.1.0. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… Ollama users Third InferenceBackend: bridges OCM to an existing Ollama daemon via its native NDJSON /api/chat API (model tag required per-request; max_tokens maps to options.num_predict; health via /api/tags). Selector untouched — explicit construction for now; daemon settings wiring is the follow-up. Parser test fixtures are VERBATIM captures from a live Ollama daemon (llama3, 2026-06-11) — pinned to the real wire format. Motivated by the amnesia-ab sandbox CONFIRMED verdict (94.2% fact recall on this exact daemon + model class). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… blocker resolved The 'five model SHA256 hashes' pre-release blocker was cleared when the unhashed Qwen3 entries were dropped (#50); the shipping registry is 3 models, all hashed. README still claimed 5 GGUFs + an open blocker. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ose + bench.py The dry-run validator requires docker-compose.yml + bench.py for ACTIVE sandboxes (caught by Bench Framework CI on PR #67 — the framework doing its job). bench.py delegates to run.mjs (ONE harness, the exact artifact that produced the CONFIRMED result); compose runs it in node:22-slim against the HOST Ollama daemon via host-gateway, same host-dependency pattern as vllm-q4-llama8b. run.mjs now honors OLLAMA_URL. Local validate_compose: PASS. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…row 9 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What's in here (5 commits on top of main)
e4613de, pre-existing on this branch)bench/isolation/memory/amnesia-ab— first memory sandbox to RUN, verdict CONFIRMED (87c19c8)ocm-inference(ad2162a) — native NDJSON/api/chat, health via/api/tags,max_tokens→num_predict; parser tests pinned to verbatim live-daemon captures. Selector untouched (daemon settings wiring is the follow-up).1121e21) — registry is 3/3 SHA256-verified since chore: drop unhashed Qwen3 entries; spec blockers cleared #50; the '5 GGUFs / open hash blocker' claims were stale.5c60cb5) —docs/release-notes/v0.1.1-draft.md.Verification
bench/isolation/memory/amnesia-ab/results/run-2026-06-11T20-32-21.json🤖 Generated with Claude Code